Evaluation of Word Vector Representations by Subspace Alignment

Authors

  • Yulia Tsvetkov
  • Manaal Faruqui
  • Wang Ling
  • Guillaume Lample
  • Chris Dyer
Abstract

Word vectors learned in an unsupervised fashion have proven to provide exceptionally effective features in many NLP tasks. The most common intrinsic evaluations of vector quality measure correlation with similarity judgments, but these often correlate poorly with how well the learned representations perform as features in downstream evaluation tasks. We present QVEC, a computationally inexpensive intrinsic evaluation measure of the quality of word embeddings based on alignment to a matrix of features extracted from manually crafted lexical resources. QVEC obtains strong correlation with the performance of the vectors in a battery of downstream semantic evaluation tasks.
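As a concrete illustration of the alignment idea in the abstract, here is a minimal sketch, assuming an embedding matrix and a linguistic feature matrix whose rows index the same vocabulary. The function name and matrix layout are illustrative assumptions, not the authors' released implementation: each embedding dimension is aligned to the single linguistic property it correlates with most, and the score sums those maximal correlations.

```python
import numpy as np

def qvec_score(X, S):
    """QVEC-style alignment score (illustrative sketch).

    X : (num_words, embed_dim) embedding matrix, one row per word.
    S : (num_words, num_properties) linguistic feature matrix built
        from a lexical resource, rows in the same word order as X.
    """
    # Center and L2-normalize columns so that a column dot product
    # equals the Pearson correlation between the two dimensions.
    Xc = X - X.mean(axis=0)
    Sc = S - S.mean(axis=0)
    Xc = Xc / (np.linalg.norm(Xc, axis=0, keepdims=True) + 1e-12)
    Sc = Sc / (np.linalg.norm(Sc, axis=0, keepdims=True) + 1e-12)
    corr = Xc.T @ Sc  # (embed_dim, num_properties) correlation matrix
    # Align each embedding dimension to its best-correlated property
    # and sum the aligned correlations.
    return corr.max(axis=1).sum()
```

Note that, under this formulation, the score sums one correlation per embedding dimension, so it grows with dimensionality and is most meaningful when comparing embeddings of equal size.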


Similar Articles

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where many low-level representations of an image exist, e.g., SIFT, HOG, and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation, and principal component analysis, are employed to d...


Intrinsic Subspace Evaluation of Word Embedding Representations

We introduce a new methodology for intrinsic evaluation of word representations. Specifically, we identify four fundamental criteria, based on the characteristics of natural language, that pose difficulties to NLP systems, and develop tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria. Current intrinsic evaluations are mostly based on...


Geometry of Polysemy

Vector representations of words have heralded a transformational approach to classical problems in NLP; the most popular example is word2vec. However, a single vector does not suffice to model the polysemous nature of many (frequent) words, i.e., words with multiple meanings. In this paper, we propose a three-fold approach for unsupervised polysemy modeling: (a) context representations, (b) sen...


"The Sum of Its Parts": Joint Learning of Word and Phrase Representations with Autoencoders

Recently, there has been a lot of effort to represent words in continuous vector spaces. Those representations have been shown to capture both semantic and syntactic information about words. However, distributed representations of phrases remain a challenge. We introduce a novel model that jointly learns word vector representations and their summation. Word representations are learnt using the ...


Una aproximación al uso de word embeddings en una tarea de similitud de textos en español

In this paper we show how a vector representation of words based on word embeddings can help to improve results in tasks focused on the semantic similarity of texts. To this end, we experimented with two methods that rely on the vector representation of words to calculate the degree of similarity of two texts: one based on the aggregation of vectors, and the other based on the calculation o...
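As a rough sketch of the first of those two methods (aggregating word vectors into a text vector and comparing texts by cosine similarity), assuming a plain dict mapping words to NumPy vectors; the names are illustrative, not the paper's actual code:

```python
import numpy as np

def text_vector(tokens, embeddings):
    """Aggregate a text into one vector by averaging its word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else None

def text_similarity(tokens_a, tokens_b, embeddings):
    """Cosine similarity between two aggregated text vectors."""
    va = text_vector(tokens_a, embeddings)
    vb = text_vector(tokens_b, embeddings)
    if va is None or vb is None:
        return 0.0
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))
```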




Publication date: 2015